Authorship and date classification using syntactic tree features
Abstract
Authorship classification of documents using syntactic features has shown high levels of success; however, the complexity of mining syntactic features has generally limited them to a basic feature set (POS tags, rewrite rules). S. Kim et al. [1] have proposed a novel algorithm to generate a set of syntactic features based on frequent subtree patterns of a set of syntactic trees. In this paper, I use a modified version of their algorithm to predict the authorship and date of historical texts.
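The abstract does not spell out how frequent subtree patterns become classifier features, so the following is only a minimal sketch under stated assumptions. It is not Kim et al.'s mining algorithm, nor the modified version used here. As a simplified stand-in, it enumerates rooted subtree patterns up to a fixed depth at every node (depth 1 is an ordinary rewrite rule such as `(NP DT NN)`), keeps the patterns that occur in at least `min_support` training documents, represents each document by the relative frequencies of those patterns, and assigns a label with a nearest-centroid rule under cosine similarity. The tree encoding (nested tuples with POS tags as leaves and words removed), every function name, and the toy corpus are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical sketch: depth-limited rooted subtrees stand in for the
# frequent-subtree mining of Kim et al.; this is NOT their algorithm.
# A tree is a nested tuple (label, child, ...); leaves are POS-tag strings.

def pattern(node, depth):
    """Canonical bracketed string for `node`, truncated after `depth` levels."""
    if isinstance(node, str):
        return node
    if depth == 0:
        return node[0]
    kids = " ".join(pattern(child, depth - 1) for child in node[1:])
    return f"({node[0]} {kids})"

def subtree_patterns(tree, max_depth=2):
    """Count rooted patterns of depth 1..max_depth at every internal node.
    Depth 1 is a plain rewrite rule such as (NP DT NN)."""
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        # A set avoids double-counting when a deeper pattern equals a shallower one.
        counts.update({pattern(node, d) for d in range(1, max_depth + 1)})
        stack.extend(node[1:])
    return counts

def document_profile(trees, max_depth=2):
    """Sum pattern counts over all sentence trees of one document."""
    total = Counter()
    for tree in trees:
        total.update(subtree_patterns(tree, max_depth))
    return total

def frequent_patterns(profiles, min_support):
    """Keep patterns that occur in at least `min_support` training documents."""
    support = Counter()
    for profile in profiles:
        support.update(set(profile))
    return sorted(p for p, s in support.items() if s >= min_support)

def vectorize(profile, vocab):
    """Relative frequency of each frequent pattern within one document."""
    n = sum(profile[p] for p in vocab) or 1
    return [profile[p] / n for p in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train_centroids(vectors, labels):
    """Average the feature vectors of each class label."""
    sums, sizes = {}, Counter(labels)
    for vec, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {y: [x / sizes[y] for x in acc] for y, acc in sums.items()}

def predict(vec, centroids):
    """Nearest centroid by cosine similarity."""
    return max(centroids, key=lambda y: cosine(vec, centroids[y]))

def demo():
    # Invented toy corpus: one sentence tree per document, two "authors".
    docs = [
        [("S", ("NP", "PRP"), ("VP", "VBD", ("NP", "DT", "NN")))],
        [("S", ("NP", "PRP"), ("VP", "VBZ", ("NP", "PRP")))],
        [("S", ("NP", "DT", "NN"), ("VP", "VBD", ("PP", "IN", ("NP", "DT", "NN"))))],
        [("S", ("NP", "DT", "JJ", "NN"), ("VP", "VBD", ("PP", "IN", ("NP", "NN"))))],
    ]
    labels = ["A", "A", "B", "B"]
    profiles = [document_profile(d) for d in docs]
    vocab = frequent_patterns(profiles, min_support=2)
    centroids = train_centroids([vectorize(p, vocab) for p in profiles], labels)
    unseen = [("S", ("NP", "PRP"), ("VP", "VBD", ("NP", "PRP")))]
    return predict(vectorize(document_profile(unseen), vocab), centroids)

print(demo())  # A
```

The same pipeline could in principle serve date prediction by replacing author labels with period labels, although how the paper actually frames that task is not stated in the abstract. A real system would use full parser output and a stronger classifier than nearest centroid.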